Conference Proceedings
Generating custom classification datasets by targeting the instance space
MA Muñoz, K Smith-Miles
GECCO 2017: Proceedings of the Genetic and Evolutionary Computation Conference Companion | Association for Computing Machinery | Published: 2017
Abstract
While machine learning has evolved at a fast pace in the last decades, the testing procedure for new methods may not be keeping pace. It often relies on well-studied collections of classification datasets such as the UCI repository. However, a meta-analysis through features has shown that most datasets from UCI are not sufficiently challenging to expose the unique weaknesses of algorithms. In this paper we present a method to generate datasets with continuous, binary and categorical attributes by fitting a Gaussian Mixture Model and a set of generalized Bernoulli distributions. By targeting empty areas of the instance space, this method has the potential to generate datasets with mor..
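The abstract says continuous attributes come from a Gaussian Mixture Model and binary or categorical attributes from generalized Bernoulli (categorical) distributions. The sketch below illustrates only the sampling step under several assumptions that the abstract does not state: the model parameters are already fitted, covariances are diagonal, categorical attributes are drawn independently of the continuous ones, and each mixture component doubles as a class label. Every name here (`GaussianComponent`, `sample_gmm`, `sample_categorical`, `generate_dataset`) is hypothetical. It is not the authors' implementation, and it covers neither the fitting nor the instance-space targeting.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class GaussianComponent:
    weight: float       # mixing proportion (need not be normalized)
    means: List[float]  # per-attribute mean
    stds: List[float]   # per-attribute std dev (diagonal covariance assumed)


def sample_gmm(components: Sequence[GaussianComponent],
               rng: random.Random) -> Tuple[int, List[float]]:
    """Pick a component by its weight, then draw each continuous attribute."""
    idx = rng.choices(range(len(components)),
                      weights=[c.weight for c in components], k=1)[0]
    comp = components[idx]
    return idx, [rng.gauss(m, s) for m, s in zip(comp.means, comp.stds)]


def sample_categorical(probs: Sequence[float], rng: random.Random) -> int:
    """Generalized Bernoulli draw: return category k with probability probs[k]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_dataset(n: int,
                     components: Sequence[GaussianComponent],
                     categorical_probs: Sequence[Sequence[float]],
                     seed: int = 0) -> Tuple[List[list], List[int]]:
    """Return (rows, labels); each row is continuous values then category codes.

    Assumption for illustration: the label is the index of the mixture
    component that generated the continuous part of the row.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        label, continuous = sample_gmm(components, rng)
        categorical = [sample_categorical(p, rng) for p in categorical_probs]
        rows.append(continuous + categorical)
        labels.append(label)
    return rows, labels


if __name__ == "__main__":
    comps = [GaussianComponent(0.7, [0.0, 0.0], [1.0, 1.0]),
             GaussianComponent(0.3, [3.0, -2.0], [0.5, 0.5])]
    # One binary attribute (two categories) and one three-level categorical attribute.
    X, y = generate_dataset(5, comps, [[0.6, 0.4], [0.2, 0.5, 0.3]], seed=42)
    for row, label in zip(X, y):
        print(label, row)
```

A binary attribute is simply a generalized Bernoulli distribution with two categories, so one sampler covers both attribute types. Seeding a private `random.Random` keeps generated datasets reproducible.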
Grants
Awarded by Australian Research Council
Funding Acknowledgements
This work is funded by the Australian Research Council through the Australian Laureate Fellowship FL140100012. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research. We also thank Dr. Toan Nguyen from Monash University's eResearch team, who implemented parallelized versions of the meta-features routines, resulting in large reductions in the overall computation time.